Analysis of the Relationships among Longest Common Subsequences, Shortest Common Supersequences and Patterns and its application on Pattern Discovery in Biological Sequences
Authors
Abstract
For a set of multiple sequences, the patterns, Longest Common Subsequences (LCS), and Shortest Common Supersequences (SCS) represent different aspects of the sequences' profile. Revealing the relationship between patterns and LCS/SCS may provide a deeper view of the patterns. In this paper, we show that patterns, LCS, and SCS are closely related to each other. Based on these relations, we propose the PALS algorithms, which discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
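For the simplest case of two sequences, the link between LCS and SCS can be made concrete. The sketch below is not the PALS algorithm from the paper; it is the textbook dynamic-programming LCS, followed by an SCS built by merging both sequences around that LCS. The function names (`lcs`, `scs`, `is_subsequence`) and the example strings are assumptions chosen for illustration only.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (classic O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def scs(a: str, b: str) -> str:
    """Return one shortest common supersequence of a and b, built around their LCS."""
    out = []
    i = j = 0
    for ch in lcs(a, b):
        # Emit the unmatched characters of a, then of b, that precede this shared character.
        while a[i] != ch:
            out.append(a[i])
            i += 1
        while b[j] != ch:
            out.append(b[j])
            j += 1
        out.append(ch)  # the shared character is written only once
        i += 1
        j += 1
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


if __name__ == "__main__":
    x, y = "AGGTAB", "GXTXAYB"  # illustrative strings, not data from the paper
    print(lcs(x, y), scs(x, y))
```

For two sequences, the lengths satisfy len(LCS) + len(SCS) = len(a) + len(b), because every character shared through the LCS is written once in the supersequence instead of twice. This identity does not carry over directly to three or more sequences, which is the setting the paper addresses.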
Similar resources
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is "Transposable Elements", which has its own sub-classes based on the contexts' structures. Much research has been performed to critically determine the structure and function of repetitiv...
Problems Related to Subsequences and Supersequences
We present an algorithm for building the automaton that searches for all non-overlapping occurrences of each subsequence from the set of subsequences. Further, we define Directed Acyclic Supersequence Graph and use it to solve the generalized Shortest Common Supersequence problem, the Longest Common Non-Supersequence problem, and the Longest Consistent Supersequence problem.
Common Subsequences and Supersequences and Their Expected Length
Let $f(n, k, l)$ be the expected length of a longest common subsequence of $l$ sequences of length $n$ over an alphabet of size $k$. It is known that there are constants $\gamma_k^{(l)}$ such that $f(n, k, l) \to \gamma_k^{(l)} n$; we show that $\gamma_k^{(l)} = \Theta(k^{1/l - 1})$. Bounds for the corresponding constants for the expected length of a shortest common supersequence are also presented.
DISCOVERY of LONGEST INCREASING SUBSEQUENCES and its VARIANTS using DNA OPERATIONS
The Longest Increasing Subsequence (LIS) and Common Longest Increasing Subsequence (CLIS) are important in many data mining applications. We propose algorithms to discover LIS and CLIS from varied databases. This work finds all increasing subsequences in a given database, increasing subsequences in n sliding windows, longest increasing sequences in one or more sequences, decrea...
On the Approximation of Shortest Common Supersequences and Longest Common Subsequences
The problems of finding shortest common supersequences (SCS) and longest common subsequences (LCS) are two well-known NP-hard problems that have applications in many areas, including computational molecular biology, data compression, robot motion planning, scheduling, and text editing. A lot of fruitless effort has been spent in searching for good approximation algorithms for these problem...
Journal title:
- International Journal of Data Mining and Bioinformatics
Volume 5, Issue 6
Pages -
Publication date: 2011